library(knitr)
library(kableExtra)

source('in/MuMAC.09Sep14.R')
source('in/MuMAC.09Sep14_distance.R')

Objectives

  1. Play with the new D-PLACE database
  2. Test a hunch about the relation of roof shape to rainfall (briefly, lots of rain negatively selects against flat roofs). Somewhat trivial, but easy to check, lots of data, and fieldwork experience in Ghana provides relevant ethnographic and historical background (including a possible account for an apparent exception).
  3. Do this while controlling for language family (Galton’s problem) and region (horizontal diffusion)

Preliminaries

Clear workspace, check for /in/ and /out/ directories (create them if needed), and load packages (install them if needed)

add_working_dir <- function(x) { if(file.exists(x)) { cat(x,"dir:",paste0(getwd(),"/",x,"/")) } else { dir.create(paste0(getwd(),"/",x)) 
  cat("subdirectory",x,"created in",getwd()) } }
add_working_dir("in")
add_working_dir("out")

list.of.packages <- c("ggplot2","tidyr","dplyr","ggthemes","lme4","extremevalues","mapproj","sp","rgeos")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)>0) install.packages(new.packages)
lapply(list.of.packages, require, character.only=T)
## Loading required package: tidyr
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
## 
##     extract
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: ggthemes
## Loading required package: lme4
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following object is masked from 'package:tidyr':
## 
##     expand
## Loading required package: extremevalues
## Loading required package: mapproj
## Loading required package: maps
## Loading required package: sp
## Loading required package: rgeos
## rgeos version: 0.4-2, (SVN revision 581)
##  GEOS runtime version: 3.6.1-CAPI-1.10.1 
##  Linking to sp version: 1.3-1 
##  Polygon checking: TRUE
rm(list.of.packages,new.packages)

Hypothesis

Main hunch: Roof type is related to precipitation such that round/sloping roofs more likely when there is much rain. Note the directionality: we expect few flat roofs in high rainfall societies, but we have no predictions for what happens in low rainfall societies. In causal terms, high rainfall selects against flat roofs, but low rainfall does not select for them.

Data

Load and pre-process D-PLACE data. Here’s a direct link to the query on D-PLACE. Attribution:

Research that uses data from D-PLACE should cite both the original source(s) of the data and the paper by Kirby et al. in which D-PLACE was first presented (e.g., research using cultural data from the Binford Hunter-Gatherer dataset: "“Binford (2001); Binford and Johnson (2006); Kirby et al. 2016).” The reference list should include the date data were accessed and URL for D-PLACE (https://d-place.org), in addition to the full references for Binford (2001), Binford and Johnson (2006), and Kirby et al. (2016)."

# Load data (skipping line 1, which is a note about attribution)

d <- read.csv(file="in/dplace-export.csv",skip=1, stringsAsFactors = FALSE)

# Rename unwieldy variable names
names(d)[names(d)=="Description..EA082.House.construction..shape.of.roof"] <- "roofdesc"
names(d)[names(d)=="Code..EA082.House.construction..shape.of.roof"] <- "roofcode"
names(d)[names(d)=="Variable..Monthly.Mean.Precipitation..mm."] <- "rainfall"
names(d)[names(d)=="Code..EA087.House.construction..secondary.house.type...shape.of.roof"] <- "roofcode.secondary"
names(d)[names(d)=="Revised.longitude"] <- "longitude"
names(d)[names(d)=="Revised.latitude"] <- "latitude"
names(d)[names(d)=="References..EA082.House.construction..shape.of.roof"] <- "references"
names(d)[names(d)=="Region.name"] <- "region"

# What's the mapping exactly?
code <- unique(d$roofcode)
desc <- unique(d$roofdesc)
roofs <- data.frame(code,desc)

# make summary variable flatroof, 1 for flat, 0 for non-flat
# we have only 95 societies with flat roofs and 1035 with non-flat
d$flatroof <- ifelse(d$roofcode == 7,1,0) # i.e. flatroof = 1 if roofcode 7, 0 otherwise
d$flatroof <- as.factor(d$flatroof)
xtabs(~ flatroof,data=d)
## flatroof
##    0    1 
## 1035   95

(There is also a bit of data for roof shape of secondary buildings, but it only covers 30% of the dataset and the biases hypothesised here are more important for main dwellings so we won’t use it here.)

First exploration

So now: how does roof flatness relate to precipitation? Comparing means, we see that the average monthly rainfall for flat roof societies is 48.1 (582 annually), while for non-flat roof societies it is twice as high at 105.8 (1270 annually). To put this in context, average annual rainfall over land on earth is 59.5 monthly (715 annually).

d %>%
  group_by(flatroof) %>%
  summarise(mean=mean(rainfall))
## # A tibble: 2 x 2
##   flatroof  mean
##   <fct>    <dbl>
## 1 0        106. 
## 2 1         48.1

Extreme values

What are the extreme values? getOutliers() computes extreme values against a baseline expectation of a normal distribution

d.outliers <- getOutliers(d$rainfall)
outlierPlot(d$rainfall, d.outliers, title="Rainfall — QQ plot with outliers\nnormal distribution, R2 = 0.9764") 

This QQ plot based on a normal distribution doesn’t look optimal: two extreme values are classified as outliers, together with the tail end of what looks like a continuous distribution. Can we do better?

Well, precipitation is probably better modelled using a Weibull distribution: after all, 0 is a hard minimum whereas there is more room for variation on the other end (cf. Wilks 1989). Whoah, that’s a pretty tight fit — R² = 0.9988!

d.outliers.weibull <- getOutliers(d$rainfall,distribution="weibull")
outlierPlot(d$rainfall, d.outliers.weibull, title="Rainfall — QQ plot with outliers\nWeibull distribution, R² = 0.9988") 

Side note: we can also plot this in ggplot2 of course — though I’m not sure how to compute the dparams for qweibull so I’ve just taken them from d.outliers.weibull

ggplot(d,aes(sample=rainfall)) +
  stat_qq(distribution=qweibull,dparams = list(shape=1.42, scale=110))

# One nice thing about ggplot is that it's trivial to group by flatroof (0/1), which already suggests different distributions

ggplot(d,aes(sample=rainfall,colour=flatroof)) +
  stat_qq(distribution=qweibull,dparams = list(shape=1.8, scale=130))

So what are the most extreme values? They have exactly the same rainfall and seem to be very close together: Saipan Island in the NW Pacific.

d.outliers.weibull$nOut # both are on the right
##  Left Right 
##     0     2
d[c(d.outliers.weibull$iRight),c("Preferred.society.name","region","rainfall","latitude","longitude")]
##     Preferred.society.name               region rainfall latitude
## 19                Chamorro Northwestern Pacific 547.9482    15.13
## 135            Carolinians Northwestern Pacific 547.9482    15.21
##     longitude
## 19     145.71
## 135    145.76

Monthly rainfall for both is recorded as 547.9482, probably taken or interpolated from the same weather station(s). The lat/long places both on Saipan Island. Annual rainfall there is highly variable, but ranges from 1900mm to 2300mm, i.e. 158-192 monthly, according to a 2004 report [1]. That is still roughly twice as high as the median, but at least it’s not six times as high. It is hard to find sources with historical precipitation data for the focal years of the Ethnographic Atlas (1930-1950), but I consider this enough reason to treat these data points as quirks to be excluded.

[1] http://www.weriguam.org/reports/item/rainfall-climatology-for-saipan-distribution-return-periods-el-nino-tropical-cyclones-and-long-term-variations.html

We create a copy of the data that excludes these extreme outliers and do the Weibull plot again as a sanity check.

d.no <- d[-d.outliers.weibull$iRight,] 

d.no.outliers <- getOutliers(d.no$rainfall,distribution="weibull")
d.no.outliers$nOut # now there are no outliers anymore
##  Left Right 
##     0     0
outlierPlot(d.no$rainfall, d.no.outliers,title="Rainfall: QQ plot with outliers\nWeibull distribution, R² = 0.9989") 

d.no %>%
  group_by(flatroof) %>%
  summarise(count=n(),mean=mean(rainfall))
## # A tibble: 2 x 3
##   flatroof count  mean
##   <fct>    <int> <dbl>
## 1 0         1033 105. 
## 2 1           95  48.1
t.test(d.no[which(d.no$flatroof==0),"rainfall"],d.no[which(d.no$flatroof==1),"rainfall"])
## 
##  Welch Two Sample t-test
## 
## data:  d.no[which(d.no$flatroof == 0), "rainfall"] and d.no[which(d.no$flatroof == 1), "rainfall"]
## t = 13.572, df = 178.13, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  48.59708 65.13317
## sample estimates:
## mean of x mean of y 
## 104.97156  48.10644

So what are the 3 societies with the highest rainfall in the flat roof category?

top3 <- d.no %>%
  filter(flatroof == 1) %>%     # filter for flat roofs
  arrange(desc(rainfall)) %>%   # arrange to sort by rainfall
  slice(1:3)                    # slice to take the first 3 rows
top3[,c("Preferred.society.name","region","rainfall","roofdesc","references")]
##   Preferred.society.name               region rainfall           roofdesc
## 1             Anak Dalam              Malesia 239.9378 Flat or horizontal
## 2                Chugach    Subarctic America 137.5110 Flat or horizontal
## 3                   Buem West Tropical Africa 105.9047 Flat or horizontal
##                                        references
## 1                  Hagen(1908); Schebesta(1928); 
## 2                            Birket-Smith(1953); 
## 3 Asmis(1911); Hinderling(1952-53); Plehn(1898);

Visualising

Plotting

Basic plots

myseed=100
set.seed(myseed)
ggplot(d.no, aes(x=flatroof,y=rainfall,fill=rainfall,colour=rainfall)) +
  theme_bw() +
  geom_jitter(alpha=0.7,shape=21,width=0.5,size=2,stroke=1) +
  scale_fill_gradient("rainfall", low="#5edcff", high="#035280") +
  scale_colour_gradient("rainfall", low="#5edcff", high="#035280") +
  stat_summary(fun.y = "median", fun.ymin = "median", fun.ymax= "median", size=0.3,width=0.33, geom = "crossbar")

ggsave(file="out/rainfall-by-rooftype.png", width=3,height=5)

# Highlight highest 3
d.no$top3 <- ifelse(d.no$Preferred.society.name %in% top3$Preferred.society.name, 1, NA)
d.no$top3 <- as.factor(d.no$top3)

set.seed(myseed)
ggplot(d.no, aes(x=flatroof,y=rainfall,fill=top3,colour=rainfall)) +
  theme_bw() +
  geom_jitter(alpha=0.7,shape=21,width=0.5,size=2,stroke=1) +
  scale_colour_gradient("rainfall", low="#5edcff", high="#035280") +
  scale_fill_manual(values=c("red"), guide = FALSE) +
  stat_summary(aes(x=flatroof,y=rainfall),inherit.aes=F,fun.y = "median", fun.ymin = "median", fun.ymax= "median", size=0.3,width=0.33, geom = "crossbar")

 ggsave(file="out/rainfall-by-rooftype-top3.png", width=3,height=5)

Maps

Maps confirm that flat roofs predominantly occur in drier places.

ggplot(d.no,aes(longitude,latitude)) + 
  theme_map() +
  borders("world", colour=NA, fill="#e7e8ea") +
  geom_point(data=filter(d.no,flatroof==0),aes(fill=rainfall,colour=flatroof),size=3,shape=21,stroke=1,alpha=0.7) +
  geom_point(data=filter(d.no,flatroof==1),aes(fill=rainfall,colour=flatroof),size=3,shape=22,stroke=1,alpha=0.7) +
  scale_colour_manual(values = c("#ffffff","brown"), guide = FALSE) +
  scale_fill_gradient("rainfall", low="#5edcff", high="#035280")

ggsave(file="out/map-rainfall-by-rooftype.png", width=12,height=8)

# map highest
last_plot() + geom_point(data=top3,aes(longitude,latitude),fill="red",colour="red",size=3,shape=22,stroke=1)

ggsave(file="out/map-rainfall-by-rooftype-highest.png", width=12,height=8)

# zoom: West Africa
last_plot() + theme_bw() # use to target desired lat/long for zooming

zoom1 <- last_plot() + borders(colour="white") + coord_map(xlim = c(-18,7), ylim = c(20,2))

zoom1 + geom_point(data=top3,aes(longitude,latitude),fill="red",colour="red",size=3,shape=22,stroke=1)

ggsave(file="out/map-rainfall-by-rooftype-west-africa.png", width=12,height=8)

Statistical modelling

So the difference in means and in distributions is suggestive. But not all observations are independent. For one thing, some languages are from the same language family and so may have inherited cultural features from a common ancestor. We use generalised linear mixed effects modelling to control for this. We add language family as a random intercept (meaning we allow the value of roof to be ‘dependent’ to some degree on language family).

We create two models: one in which we try to predict flatroof by language family alone, another where add rainfall as a fixed effect. The second one is what we’re primarily interested in. Comparing them, the one including rainfall as a fixed effect, in # addition to language family, explains the spread of data significantly better (χ2(1) = 51.331, p < 0.00001, log likelihood difference 25.67).

model1 <- glmer(flatroof ~ (1|Language.family), data=d.no,family=binomial)
summary(model1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: flatroof ~ (1 | Language.family)
##    Data: d.no
## 
##      AIC      BIC   logLik deviance df.resid 
##    536.9    547.0   -266.5    532.9     1126 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2675 -0.2849 -0.1048 -0.0136  9.5412 
## 
## Random effects:
##  Groups          Name        Variance Std.Dev.
##  Language.family (Intercept) 72.31    8.504   
## Number of obs: 1128, groups:  Language.family, 158
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -8.468      1.281  -6.608 3.89e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model2 <- glmer(flatroof ~ rainfall + (1|Language.family), data=d.no,family=binomial)
summary(model2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: flatroof ~ rainfall + (1 | Language.family)
##    Data: d.no
## 
##      AIC      BIC   logLik deviance df.resid 
##    487.6    502.7   -240.8    481.6     1125 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.7494 -0.2333 -0.0529 -0.0148 31.5177 
## 
## Random effects:
##  Groups          Name        Variance Std.Dev.
##  Language.family (Intercept) 41.17    6.416   
## Number of obs: 1128, groups:  Language.family, 158
## 
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.356594   1.599964  -3.348 0.000814 ***
## rainfall    -0.028642   0.004758  -6.020 1.74e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr)
## rainfall -0.185
comparison <- anova(model1, model2)
comparison[1,4]-comparison[2,4] # log likelihood difference
## [1] -25.66531

But the maps show that some of the flat roofs appear in ‘clumps’. Perhaps horizontal diffusion due to cultural contact can explain part of it. Best way to check this would be to compute distance to nearest flatroof neighbour (relevant code here: https://is.gd/FFLbKH), but the coordinate data in D-PLACE is rounded which makes that measure uninformative in regions with many societies. So let’s use region as a proxy. The reasoning is: if you’re in the same region, chances are you have the same kind of roof, whatever it is (i.e. a random intercept).

We compare model 2 with this new model. The new model, which includes region besides language family, explains the spread of data significantly better (χ2(1) = 56.262, p < 0.00001, log likelihood difference 28.13).

model3 <- glmer(flatroof ~ rainfall + (1|region) + (1|Language.family), data=d.no,family=binomial)
summary(model3)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: flatroof ~ rainfall + (1 | region) + (1 | Language.family)
##    Data: d.no
## 
##      AIC      BIC   logLik deviance df.resid 
##    433.3    453.4   -212.7    425.3     1124 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6833 -0.1615 -0.0559 -0.0212 16.4700 
## 
## Random effects:
##  Groups          Name        Variance Std.Dev.
##  Language.family (Intercept) 7.848    2.801   
##  region          (Intercept) 8.563    2.926   
## Number of obs: 1128, groups:  Language.family, 158; region, 50
## 
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.946465   1.478939  -2.668  0.00762 ** 
## rainfall    -0.025263   0.005173  -4.884 1.04e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr)
## rainfall -0.232
comparison <- anova(model2, model3)
comparison[1,4]-comparison[2,4] # log likelihood difference
## [1] -28.13118

Do we even need rainfall, or are region and language family sufficient to explain the distribution of flat roofs? To find out, we compare a model with just region & language family as random effects (model 4) with a model with rainfall added as a fixed effect (model 3). The fullest model, with rainfall as main predictor and region and language family as additional factors, explains the data significantly better (χ2(1) = 29.704, p < 0.00001, log likelihood difference 14.85).

model4 <- glmer(flatroof ~ (1|region) + (1|Language.family), data=d.no,family=binomial)
summary(model4)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: flatroof ~ (1 | region) + (1 | Language.family)
##    Data: d.no
## 
##      AIC      BIC   logLik deviance df.resid 
##    461.0    476.1   -227.5    455.0     1125 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3287 -0.1671 -0.0433 -0.0084  6.3584 
## 
## Random effects:
##  Groups          Name        Variance Std.Dev.
##  Language.family (Intercept) 51.18    7.154   
##  region          (Intercept) 14.31    3.784   
## Number of obs: 1128, groups:  Language.family, 158; region, 50
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -9.304      1.827  -5.093 3.52e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
comparison <- anova(model3, model4)
comparison
## Data: d.no
## Models:
## model4: flatroof ~ (1 | region) + (1 | Language.family)
## model3: flatroof ~ rainfall + (1 | region) + (1 | Language.family)
##        Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## model4  3 461.02 476.10 -227.51   455.02                             
## model3  4 433.31 453.43 -212.66   425.31 29.704      1  5.034e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
comparison[1,4]-comparison[2,4] # log likelihood difference
## [1] -14.8519

Footnote: We excluded some extreme values (which turned out to be in the flatroof=0 category). Would including them have made any difference for the results? Nope: the full model (model 6) explains the data significantly better (χ2(1) = 29.734, p < 0.00001, log likelihood difference 14.87). Excluding the extreme values was mainly useful to make the plots a little prettier.

model5 <- glmer(flatroof ~ (1|region) + (1|Language.family), data=d,family=binomial)
model6 <- glmer(flatroof ~ rainfall + (1|region) + (1|Language.family), data=d,family=binomial)
comparison <- anova(model5, model6)
comparison
## Data: d
## Models:
## model5: flatroof ~ (1 | region) + (1 | Language.family)
## model6: flatroof ~ rainfall + (1 | region) + (1 | Language.family)
##        Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## model5  3 461.05 476.14 -227.52   455.05                             
## model6  4 433.31 453.43 -212.66   425.31 29.734      1  4.957e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
comparison[1,4]-comparison[2,4] # log likelihood difference
## [1] -14.86684

Botero (2014) Nearest-neighbour approach

Here is an approach implmented first in ‘The ecology of religious beliefs’ by Carlos Botero and colleagues. This approach includes language family as a random effect (as in the model above) and estimates the effect of spatial proximity by using the average response value of N nearest neighbours as a predictor variable. The logic being if something is transmitted horizontally, neighbours have similar trait values and this will predict a particular societies value.

First, we will look at the NN approach - however, since we are using a global sample, neighbours are not always of the same distance, meaning the definition of horiztonal influence varies (e.g. Rapanui’s nearest 5 neighbours will be very different to a PNG society). So, secondly we alter this approach to take the nearest neighbours within a certain distance. This approach will be called closest neighbours.

Latitude and Longitude variables are used to calculate nearest & closest neighbours.

Nearest neighbour

The default value of this approach is to use NN = 10 which we have used here. The model here is effectively, flatroof ~ rainfall + NN Score + (1|Language Family)

d$flatroof = d$flatroof %>% 
  as.character(.) %>% 
  as.numeric(.)

mydata = d
response = "flatroof"
Predictors = "rainfall"
VerticalDependency = "Language.family"
Nneighbors = 1000

fit = MuMAC(mydata = d, 
            response = "flatroof", 
            Predictors = "rainfall", 
            VerticalDependency = "Language.family", 
            Nneighbors = 10)
## [1] "Computing spatial dependencies..."
## [1] "Estimating all possible model combinations..."
## [1] "[Please be patient]"
#fit

kable(fit$ModWeights, format = 'html', digits = 3) %>% 
  kable_styling()
Model delta AICc AICc weight
5 (Intercept) NeighborScore 0.000 0.523
1 (Intercept) NeighborScore VerticalDependency 1.912 0.201
6 (Intercept) NeighborScore rainfall 1.992 0.193
2 (Intercept) NeighborScore rainfall VerticalDependency 3.668 0.083
3 (Intercept) rainfall VerticalDependency 561.248 0.000
4 (Intercept) VerticalDependency 601.798 0.000
7 (Intercept) rainfall 665.287 0.000
8 (Intercept) 719.185 0.000
kable(fit$MM_average, format = 'html', digits = 3) %>% 
  kable_styling()
Estimate Std. Error RVI PredictiveAccuracy
(Intercept) 0.002 0.009 1.000 NA
NeighborScore 0.912 0.030 1.000 NA
rainfall 0.000 0.000 0.276 NA
VerticalDependency NA NA 0.284 NA

This approach fits all possible models, and compares them based on AICc values. The first table here shows the delta AICc and AICc weight of each model. The second table shows the beta estimates for each variable in the model and the Relative Variable Importance (RVI). RVI is calculated using the presence of variables in the better fitting models - i.e. if a variable is often in the better fitting models, then it has high RVI. Predictive accuracy is another output from this model, but it hasn’t been utilised here so is empty. These outputs are the same for the closest neighbours output.

Using the default NN value, we see the best model is one only using the Nearest neighbour score. Neighbour score also has an RVI of one. (The intercept has to be in every model). This contradicts the ideas above.

fit = MuMAC(mydata = d, 
            response = "flatroof", 
            Predictors = "rainfall", 
            VerticalDependency = "Language.family", 
            Nneighbors = 20)
## Loading required package: ROCR
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'ROCR'
## [1] "Computing spatial dependencies..."
## Warning in knearneigh(xys, k = Nneighbors, longlat = TRUE): knearneigh:
## identical points found
## [1] "Estimating all possible model combinations..."
## Fixed term is "(Intercept)"
## Fixed term is "(Intercept)"
## [1] "[Please be patient]"
#fit

kable(fit$ModWeights, format = 'html', digits = 3) %>% 
  kable_styling()
Model delta AICc AICc weight
1 (Intercept) NeighborScore VerticalDependency 0.000 0.497
2 (Intercept) NeighborScore rainfall VerticalDependency 0.893 0.318
5 (Intercept) NeighborScore 2.645 0.133
6 (Intercept) NeighborScore rainfall 4.518 0.052
3 (Intercept) rainfall VerticalDependency 420.394 0.000
4 (Intercept) VerticalDependency 460.943 0.000
7 (Intercept) rainfall 524.432 0.000
8 (Intercept) 578.331 0.000
kable(fit$MM_average, format = 'html', digits = 3) %>% 
  kable_styling()
Estimate Std. Error RVI PredictiveAccuracy
(Intercept) 0.008 0.014 1.000 NA
NeighborScore 0.972 0.042 1.000 NA
rainfall 0.000 0.000 0.370 NA
VerticalDependency NA NA 0.816 NA

Using double the number of Neighbours, we see the importance of Vertical dependency increases to a much higher level. This is perhaps because NN scores now spans across language families. Rainfall however, remains relatively unimportant. The most important model now is one including NeighbourScore and the Language family random effect.

Closest neighbours

Here, I have altered the approach slightly to take the average of neighbours within a certain distance. This is currently a small hack in the code which mean Nneighbours now represents distance rather than number of neighbours. Currently I am just guesstimating the best distance, so a more methodological approach to choosing distance might be appropriate. I will start with neighbours within 1000kms.

fit = MuMAC_distance(mydata = d, 
            response = "flatroof", 
            Predictors = "rainfall", 
            VerticalDependency = "Language.family", 
            Nneighbors = 1000) 
## [1] "Computing spatial dependencies..."
## [1] "Estimating all possible model combinations..."
## [1] "[Please be patient]"
# Nneighbours now takes a distance value for neighbours to be within

kable(fit$ModWeights, format = 'html', digits = 3) %>% 
  kable_styling()
Model delta AICc AICc weight
1 (Intercept) NeighborScore rainfall VerticalDependency 0.000 0.998
2 (Intercept) NeighborScore VerticalDependency 12.196 0.002
5 (Intercept) NeighborScore rainfall 45.427 0.000
6 (Intercept) NeighborScore 53.054 0.000
3 (Intercept) rainfall VerticalDependency 191.704 0.000
4 (Intercept) VerticalDependency 232.253 0.000
7 (Intercept) rainfall 295.743 0.000
8 (Intercept) 349.641 0.000
kable(fit$MM_average, format = 'html', digits = 3) %>% 
  kable_styling()
Estimate Std. Error RVI PredictiveAccuracy
(Intercept) 0.073 0.022 1.000 NA
NeighborScore 0.689 0.047 1.000 NA
rainfall -0.001 0.000 0.998 NA
VerticalDependency NA NA 1.000 NA

When we run this approach all predictors are of basically equal importance, and therefore are all included in the best model.

If we reduce the NN distance to 500km

fit = MuMAC_distance(mydata = d, 
            response = "flatroof", 
            Predictors = "rainfall", 
            VerticalDependency = "Language.family", 
            Nneighbors = 500) 
## [1] "Computing spatial dependencies..."
## [1] "Estimating all possible model combinations..."
## [1] "[Please be patient]"
# Nneighbours now takes a distance value for neighbours to be within

kable(fit$ModWeights, format = 'html', digits = 3) %>% 
  kable_styling()
Model delta AICc AICc weight
1 (Intercept) NeighborScore rainfall VerticalDependency 0.000 0.993
2 (Intercept) NeighborScore VerticalDependency 9.870 0.007
5 (Intercept) NeighborScore rainfall 49.478 0.000
6 (Intercept) NeighborScore 64.833 0.000
3 (Intercept) rainfall VerticalDependency 223.702 0.000
4 (Intercept) VerticalDependency 264.251 0.000
7 (Intercept) rainfall 327.741 0.000
8 (Intercept) 381.639 0.000
kable(fit$MM_average, format = 'html', digits = 3) %>% 
  kable_styling()
Estimate Std. Error RVI PredictiveAccuracy
(Intercept) 0.065 0.021 1.000 NA
NeighborScore 0.554 0.035 1.000 NA
rainfall 0.000 0.000 0.993 NA
VerticalDependency NA NA 1.000 NA

What is not clear is how good this model is to the Nearest neighbour models. This will need further investigating.

Conclusions

In this dataset, rainfall is a strong predictor of roof shape, controlling for language family and region. Flat roofs are unlikely to occur in locales with a lot of precipitation. So the hypothesis is supported: rainfall selects against flat roots.

References

Botero, C. A., Gardner, B., Kirby, K. R., Bulbulia, J., Gavin, M. C., & Gray, R. D. (2014). The ecology of religious beliefs. Proceedings of the National Academy of Sciences, 111(47), 16784–16789. https://doi.org/10.1073/pnas.1408701111